Tolerating Branch Predictor Latency on SMT

Authors

  • Ayose Falcón
  • Oliverio J. Santana
  • Alex Ramírez
  • Mateo Valero
Abstract

Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, its resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with the branch predictor delay on SMT. Our contribution is two-fold: we describe a decoupled implementation of the SMT fetch unit, and we propose an interthread pipelined branch predictor implementation. These techniques prove to be effective for tolerating the branch predictor access latency.

Keywords: SMT, branch predictor delay, decoupled fetch, predictor pipelining.
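
As a rough illustration of why interleaving threads can hide a multi-cycle predictor, the following toy Python model may help. It is a sketch under stated assumptions, not the design evaluated in the paper: it assumes a predictor that accepts one new request per cycle, a thread that cannot request its next prediction until the previous one completes, and a simple round-robin thread selection. The function name and parameters are hypothetical.

```python
def predictions_per_cycle(num_threads, latency, cycles=1000):
    """Toy model (assumption, not the paper's design): a pipelined predictor
    accepts one request per cycle, but each thread must wait `latency` cycles
    for its own prediction before it can request the next one."""
    ready_at = [0] * num_threads  # cycle at which each thread may request again
    issued = 0                    # prediction requests started so far
    next_thread = 0               # round-robin starting point
    for cycle in range(cycles):
        # Pick the first ready thread, scanning round-robin from next_thread.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                issued += 1
                next_thread = (t + 1) % num_threads
                break
    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 2, 4):
        rate = predictions_per_cycle(threads, latency=2)
        print(f"{threads} thread(s), 2-cycle predictor: {rate:.2f} predictions/cycle")
```

In this model a single thread with a 2-cycle predictor starts a prediction only every other cycle, while two or more threads keep the predictor busy every cycle, which is the intuition behind pipelining predictor accesses across threads.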

Similar resources

A latency-conscious SMT branch prediction architecture

Executing multiple threads has proved to be an effective solution to partially hide latencies that appear in a processor. When a thread is stalled because a long-latency operation is being processed, like a memory access or a floating-point calculation, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions cau...

An Effective Bypass Mechanism to Enhance Branch Predictor for SMT Processors

Unlike traditional superscalar processors, a Simultaneous Multithreaded (SMT) processor can exploit both instruction-level parallelism and thread-level parallelism at the same time. With the same fetch width, SMT does not fetch as deeply from any single thread as a traditional superscalar processor does. Meanwhile, all the instructions from different threads share the same functional units in SMT. All...

Evaluating Branch Predictors on an SMT Processor

Simultaneous multithreading (SMT) provides significant increases in microprocessor throughput by issuing instructions from multiple threads per clock cycle. SMT can be realized in a wide-issue superscalar with a modest increase in resources, because much of the hardware is shared among the multiple thread contexts. Branch prediction accuracy, a key component of microprocessor performance, can s...

Neural Branch Prediction

The new neural predictor improves accuracy by combining path and pattern history to overcome limitations inherent to previous predictors. It uses a different prediction algorithm that would allow parallel execution of instructions during every prediction, thereby keeping the latency low. In fact, the fast path-based neural predictor has a latency comparable to the predictors from industrial desi...

Reconsidering Complex Branch Predictors

To sustain instruction throughput rates in more aggressively clocked microarchitectures, microarchitects have incorporated larger and more complex branch predictors into their designs, taking advantage of the increasing numbers of transistors available on a chip. Unfortunately, because of penalties associated with their implementations, the extra accuracy provided by many branch predictors does...

Publication date: 2003